Shaping Proto-Value Functions Using Rewards

Authors

  • Raj Kumar Maity
  • Chandrashekar Lakshminarayanan
  • Sindhu Padakandla
  • Shalabh Bhatnagar
Abstract

In reinforcement learning (RL), an important sub-problem is learning the value function, and this learning is chiefly influenced by the architecture used to represent the value function. Often, the value function is expressed as a linear combination of a pre-selected set of basis functions. These basis functions are either selected in an ad-hoc manner or are tailored to the RL task using domain knowledge. Selecting basis functions in an ad-hoc manner does not give a good approximation of the value function, while choosing functions using domain knowledge introduces a dependency on the task. A desirable method would therefore choose basis functions that are task-independent yet still approximate the value function well. In this paper, we propose a novel task-independent method to construct reward-based Proto-Value Functions (RPVFs) using the topology of the state space and the reward structure of the underlying RL task. Our methodology uses the connectivity of the state space and the immediate reward structure to construct the basis functions required for linear approximation of the value function. The proposed approach yields enhanced learning performance. In particular, when the state space is symmetric but the value function is asymmetric, the basis functions so constructed capture the asymmetry in the value function better than previous approaches. We demonstrate the effectiveness of RPVFs in approximating the value function via experiments on benchmark RL problems as well as on a non-standard problem.
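
Standard proto-value functions (Mahadevan and Maggioni, 2007) are the smoothest eigenvectors of the graph Laplacian built from the state-connectivity graph. The Python sketch below shows that baseline construction, together with one hypothetical way of folding the immediate-reward vector into the basis; it is an illustration under stated assumptions, not the RPVF construction defined in this paper.

    import numpy as np

    def pvf_basis(adjacency, k):
        """Standard PVFs: the k smoothest eigenvectors of the graph Laplacian."""
        degree = np.diag(adjacency.sum(axis=1))
        laplacian = degree - adjacency                # combinatorial graph Laplacian
        eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
        return eigvecs[:, :k]                         # columns = basis functions over states

    def reward_augmented_basis(adjacency, rewards, k):
        """Hypothetical reward-aware variant (NOT the paper's RPVF recipe):
        append the normalized immediate-reward vector to the topological
        basis, so that an asymmetric reward structure can be represented
        even when the state graph itself is symmetric."""
        phi = pvf_basis(adjacency, k)
        r = rewards / (np.linalg.norm(rewards) + 1e-12)
        return np.column_stack([phi, r])

    # Toy 4-state chain with a reward only on the last state.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    R = np.array([0.0, 0.0, 0.0, 1.0])
    Phi = reward_augmented_basis(A, R, k=2)           # value function: V ≈ Phi @ w
    print(Phi.shape)                                  # (4, 3)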

Related Papers

Shaping Proto-Value Functions via Rewards

Learning the value function is an important sub-problem in solving a given reinforcement learning task. The choice of representation for the value function directly affects learning. The most widely used representation is the linear architecture, wherein the value function is written as a linear combination of a ‘pre-selected’ set of basis functions. In such a scenario, choo...
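
For concreteness, the linear architecture mentioned here represents the value function in the standard form

    $V(s) \approx \sum_{i=1}^{k} w_i \, \phi_i(s),$

where $\phi_1, \dots, \phi_k$ are the pre-selected basis functions and the weights $w_i$ are learned from data.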

Potential-based difference rewards for multiagent reinforcement learning

Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven not to alter the Nash equilibria of the system but requires domain-speci...
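
For reference, a potential-based shaping reward has the standard form introduced by Ng, Harada, and Russell (1999),

    $F(s, a, s') = \gamma \, \Phi(s') - \Phi(s),$

where $\Phi$ is a real-valued potential function over states and $\gamma$ is the discount factor; restricting shaping rewards to this form is what yields the invariance guarantee mentioned above.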

Evolved Intrinsic Reward Functions for Reinforcement Learning

The reinforcement learning (RL) paradigm typically assumes a given reward function that is part of the problem being solved by the agent. However, in animals, all reward signals are generated internally, rather than being received directly from the environment. Furthermore, animals have evolved motivational systems that facilitate learning by rewarding activities that often bear a distal relati...

Imitation in Reinforcement Learning

The promise of imitation is to facilitate learning by allowing the learner to observe a teacher in action. Ideally this will lead to faster learning when the expert knows an optimal policy. Imitating a suboptimal teacher may slow learning, but it should not prevent the student from surpassing the teacher’s performance in the long run. Several researchers have looked at imitation in the context ...

Imitation Learning with Demonstrations and Shaping Rewards

Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has led to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improve the learning efficiency of IL by providing a sha...

Publication date: 2016